Collaborating Authors

 Marathon County


NarraGuide: an LLM-based Narrative Mobile Robot for Remote Place Exploration

Hu, Yaxin, Sato, Arissa J., Du, Jingxin, Ye, Chenming, Zhu, Anjun, Praveena, Pragathi, Mutlu, Bilge

arXiv.org Artificial Intelligence

Robotic telepresence enables users to navigate and experience remote environments. However, effective navigation and situational awareness depend on users' prior knowledge of the environment, limiting the usefulness of these systems for exploring unfamiliar places. We explore how integrating location-aware LLM-based narrative capabilities into a mobile robot can support remote exploration. We developed a prototype system, called NarraGuide, that provides narrative guidance for users to explore and learn about a remote place through a dialogue-based interface. We deployed our prototype in a geology museum, where remote participants (n=20) used the robot to tour the museum. Our findings reveal how users perceived the robot's role, engaged in dialogue during the tour, and expressed preferences for how the robot should handle encounters with bystanders. Our work demonstrates the potential of LLM-enabled robotic capabilities to deliver location-aware narrative guidance and enrich the experience of exploring remote environments.


OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning

Fu, Ling, Yang, Biao, Kuang, Zhebin, Song, Jiajun, Li, Yuzhe, Zhu, Linghao, Luo, Qidi, Wang, Xinyu, Lu, Hao, Huang, Mingxin, Li, Zhang, Tang, Guozhi, Shan, Bin, Lin, Chunhui, Liu, Qi, Wu, Binghong, Feng, Hao, Liu, Hao, Huang, Can, Tang, Jingqun, Chen, Wei, Jin, Lianwen, Liu, Yuliang, Bai, Xiang

arXiv.org Artificial Intelligence

Scoring the Optical Character Recognition (OCR) capabilities of Large Multimodal Models (LMMs) has witnessed growing interest recently. Existing benchmarks have highlighted the impressive performance of LMMs in text recognition; however, their abilities on certain challenging tasks, such as text localization, handwritten content extraction, and logical reasoning, remain underexplored. To bridge this gap, we introduce OCRBench v2, a large-scale bilingual text-centric benchmark with currently the most comprehensive set of tasks (4x more tasks than the previous multi-scene benchmark OCRBench), the widest coverage of scenarios (31 diverse scenarios including street scene, receipt, formula, diagram, and so on), and thorough evaluation metrics, with a total of 10,000 human-verified question-answering pairs and a high proportion of difficult samples. After carefully benchmarking state-of-the-art LMMs on OCRBench v2, we find that 20 out of 22 LMMs score below 50 out of 100 and suffer from five types of limitations: recognition of less frequently encountered text, fine-grained perception, layout perception, complex element parsing, and logical reasoning. The benchmark and evaluation scripts are available at https://github.com/Yuliang-liu/MultimodalOCR.


WavePulse: Real-time Content Analytics of Radio Livestreams

Mittal, Govind, Gupta, Sarthak, Wagle, Shruti, Chopra, Chirag, DeMattee, Anthony J, Memon, Nasir, Ahamad, Mustaque, Hegde, Chinmay

arXiv.org Artificial Intelligence

Radio remains a pervasive medium for mass information dissemination, with AM/FM stations reaching more Americans than either smartphone-based social networking or live television. Increasingly, radio broadcasts are also streamed online and accessed over the Internet. We present WavePulse, a framework that records, documents, and analyzes radio content in real time. While our framework is generally applicable, we showcase the efficacy of WavePulse in a collaborative project with a team of political scientists focusing on the 2024 Presidential Elections. We use WavePulse to monitor livestreams of 396 news radio stations over a period of three months, processing close to 500,000 hours of audio streams. These streams were converted into time-stamped, diarized transcripts and analyzed to answer key political science questions at both the national and state levels. Our analysis revealed how local issues interacted with national trends, providing insights into information flow. Our results demonstrate WavePulse's efficacy in capturing and analyzing content from radio livestreams sourced from the Web. Code and dataset can be accessed at https://wave-pulse.io.


Immunis Appoints Neil Sahota as Chief Artificial Intelligence Officer

#artificialintelligence

Immunis, a private biotechnology company developing a novel treatment for age and disease-related immune decline, has appointed Neil Sahota as its Chief Artificial Intelligence (AI) Officer. AI is a tool that is transforming the way businesses and scientists integrate information, conduct data analysis, and make informed decisions on how to optimize growth. For over 20 years, Neil has inspired AI modernization through technology-based business strategies and has been successful in helping businesses become leaders in the digital future. Immunis is confident that Neil will guide the company to unlock the considerable potential of AI in biotech.